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Abstract 

In this paper, we show a way to exploit sparsity in the problem 
data in a primal-dual potential reduction method for solving a class 
of semidefinite programs. When the problem data is sparse, the dual 
variable is also sparse, but the primal one is not. To avoid work- 
ing with the dense primal variable, we apply Fukuda et al.'s theory 
of partial matrix completion and work with partial matrices instead. 
The other place in the algorithm where sparsity should be exploited 
is in the computation of the search direction, where the gradient and 
the Hessian-matrix product of the primal and dual barrier functions 
must be computed in every iteration. By using an idea from auto- 
matic differentiation in backward mode, both the gradient and the 
Hessian-matrix product can be computed in time proportional to the 
time needed to compute the barrier functions of sparse variables itself. 
Moreover, the high space complexity that is normally associated with 
the use of automatic differentiation in backward mode can be avoided 
in this case. In addition, we suggest a technique to efficiently compute 
the determinant of the positive definite matrix completion that is re- 
quired to compute primal search directions. The method of obtaining 
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one of the primal search directions that minimizes the number of the 
evaluations of the determinant of the positive definite completion is 
also proposed. We then implement the algorithm and test it on the 
problem of finding the maximum cut of a graph. 

1 Introduction 

Let S n denote the space ofnxn symmetric matrices. Given A v e S n ,p = 
1, 2, . . . , m, b G W 71 , and C G S n , semidefmite programming (SDP) problems 
in standard form are given as 

minxes™ C • X 

subject to: A p • X = b p ,p = 1, . . . , m, (1) 
X > 0. 

The notation C • X represents the inner product of C and X, which is equal 
to CijXij. The constraint X > denotes that X must be symmetric 
positive semidefmite, which means w T Xw > for any w e M. n . Similar to 
linear programming, semidefmite programs have associated dual problems. 
The semidefmite program (JTJ is said to be in primal form. Its dual, which is 
also a SDP problem, is 

max y6R m !5eS n b T y 

subject to: YJ^=i Vv A v + S = C, (2) 
S > 0, 

where S is called the dual slack matrix. 

Semidefmite programming has many applications in many fields (see |T3] 
for a list of applications). It can also be regarded as an extension of linear 
programming. As a result, various methods for solving linear programming 
have been extended to solve SDP. In particular, interior-point methods were 
first extended to SDP by Alizadeh pQ and Nesterov and Nemirovskii [To] 
independently. The most effective interior-point methods are primal-dual 
approaches that use information from both primal and dual programs. Most 
primal-dual interior-point algorithms for SDP proposed fall into two cate- 
gories: path-following and potential reduction methods. Our algorithm is 
of the potential reduction kind, in which a potential function is denned and 
each iterate reduces the potential by at least a constant amount. It is based 
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on the primal-dual potential reduction method proposed by Nesterov and 
Nemirovskii in their book [To] . 

This paper focuses on the sparse case of SDP, where the data matrices C 
and ApS consist of mostly zero entries. Because most problem data arising 
in practice are sparse, it is vital for an SDP solver to take advantage of the 
sparsity and avoid unnecessary computation on zero entries. The obstacle 
that prevents effective exploitation of sparsity in an SDP algorithm is that 
the primal matrix variable is dense regardless of the sparsity of the data. To 
avoid this problem, Benson et al. proposed a pure dual interior-point method 
for sparse case 0. Later, Fukuda et al. proposed a primal-dual algorithm 
using partial matrix and matrix completion theory to avoid the dense primal 
matrix Our algorithms follow Fukuda et al.'s suggestion and uses partial 
primal matrix to take advantage of sparsity in the primal-dual framework. 
In contrast, a recent work by Burer is also built upon Fukuda et al.'s idea of 
using partial matrix but his algorithm is a primal-dual path-following method 
based on a new search direction [3]. 

In Nesterov and Nemirovskii's primal-dual potential reduction method, 
the computation of the search directions requires the gradient and Hessian- 
matrix product of the barrier functions. The currently common way to com- 
pute this gradient is not efficient in some sparse case. Our algorithm applies 
the idea from automatic differentiation in reverse mode to compute gradient 
and Hessian-matrix product in a more efficient manner for the sparse cases. 
Additionally, we suggest a technique that evaluates the barrier function value 
of a partial matrix efficiently in certain cases and an alternative way to com- 
pute the search directions when such evaluation is expensive. When the data 
matrices' aggregated sparsity pattern forms a planar graph, our algorithm 
manages to reduce the time complexity to 0(n 5 ^ 2 ) operations and the space 
complexity to O(nlogn) per SDP iterate. This is a significant improvement 
from the 0(n 3 ) time complexity and the 0(n 2 ) space complexity per iterate 
of a typical SDP solver for planar case. 

We start with review of necessary material on semidefinite programming, 
Nesterov and Nemirovskii's primal-dual potential reduction method, and 
sparse matrix computation in Section |21 El and EJ respectively. Section 
covers the detail of our algorithm's computation of the dual Newton direc- 
tion including how an idea from automatic differentiation in reverse mode 
is used to evaluate the gradient and Hessian-matrix product efficiently. The 
computation of the primal projected Newton direction as well as the efficient 
method in computing the determinant of the positive definite completion of a 
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partial matrix are discussed in Section |H1 Finally, the results of our algorithm 
on test instances of the problem of finding maximum cut are in Section |SJ 



2 Preliminaries on semidefinite programming 

We refer to A and (y, S) as feasible solutions if they satisfy the constraints 
in (JU) and (0), respectively. A strictly feasible solution is a feasible solution 
such that X (or S) are symmetric positive definite. A matrix A is symmetric 
positive definite (A > 0) if w T Aw > for any w £ ~R n \ {0}. Problem ([T]) 
(resp. (0)) is strictly feasible if it contains a strictly feasible solution. 

Let V (resp. V) denote the set of feasible solutions of (JJJ (resp. (J2J)), and 
V (resp. X>') denote the set of strictly feasible solutions of ((TJ) (resp. (J2J)). 
The duality gap, which is the difference between primal and dual objective 
functions 



is nonnegative at any feasible solution Under the assumption that (JJJ 
and (J2J) are strictly feasible and bounded, (X*,y*,S*) solves flJJ) and © if 
and only if A* £ P, (y*, S*) £ £>, and S* • X* = 0. 

3 Nesterov and Nemirovskii's Primal-dual Po- 
tential Reduction Method 

Primal-dual potential reduction methods solve SDP programs by minimizing 
the potential function 

<f>(X,S) = (n + 7 V / ^)ln(S«A)-lndetA-lndetS:PxI?^K, (3) 

where 7 > is a given constant parameter of the algorithm. Any sequence 
of iterates in which the potential <p(X, S) tends to negative infinity con- 
verges to (or at least has an accumulation point at) a strictly feasible solution 
(A*, y*, S*) such that S* • X* = and hence is optimal |17j . 





SmX, 
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Most primal-dual potential reduction methods begin at a strictly feasible 
iterate and compute the next strictly feasible iterate while guaranteeing at 
least constant decrease in each iteration. The process is continued until an 
iterate with duality gap less than or equal to e is found, where e > is a given 
tolerance. Nesterov and Nemirovskii proposed a polynomial-time primal-dual 
potential reduction method for more general convex programming problems 
in their book ^S] , which is the basis of our algorithm. We now proceed to 
explain the "large step" version of their method as applied to the SDP (0) 
and (J2J). We call this method the "Decoupled Primal-Dual" algorithm (DPD) 
since the computation of primal and dual directions are less directly coupled 
than in other primal-dual interior point methods. Given a current strictly 
feasible iterate (X, S), compute the next iterate (X', S') as follows: 

(i) Let 

o • X 

Define the function 

V (W) = -lndetiy + Mm (W - X), 

where W G S n . 

(ii) Find the projected Newton direction N of v onto V at X, 



N = argmm H {v'(X)»H+-(v"(X)H)»H : A p »H = 0,p = 1, 2, . . . , m}. 

(iii) Let A be 

(iv) We then have 



A = [(v"(X)N) • N] 1/2 . 



N 



1 + A' 



and 
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ASi = [-Vlndet(JT) - (V 2 lndet(X))iV] - S 

[n + 7V^) 

as a primal and dual directions respectively. 

By swapping the roles of primal and dual in (i)-(iv), another pair of 
directions can be achieved as follow: 

(v) Let 

M= {n + 1 f ) X. 

Define the function 



v(W) = - IndetW + M • (W - S). 
(vi) Find the projected Newton direction N of v onto V at S, 

N = argmm H {v{S)»H+-{v'{S)H)»H : H = J^ z p A p> for some z e Rm }- 



P =i 



(vii) Let A be 

(viii) Then 



~\ = [(v"(S)N)*N} 1 / 2 . 



AS 2 = » 
1 + A 



and 



AX 2 = - S ' X - [-V\ndet(S) - (V 2 In det(S)W] - X 
are another dual and primal directions, respectively. 
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(ix) Find 



(/i*, h 2 , fc*, k *) = &rgmm huh2MM <j){X + hAXi + h 2 AX 2 , S + hAS! + k 2 AS 2 ) 
subject to: X + /ixAXt + /j 2 AI 2 > 0, and 

5 + fc 1 AS'i + A; 2 A,S 2 > 0. 

(x) Finally, set 

X' = X + hlAXi + h* 2 AX 2 , 
S' = S + klAS! + k* 2 AS 2 . 

Nesterov and Nemirovskii also showed that DPD algorithm achieves a 
constant reduction in at each iteration even when only two directions AXi 
and AS i are considered and potential minimization in step (ix) is not per- 
formed (that is, when fixing h\ — k* — 1 and h 2 = k 2 = 0). 

4 Sparse matrix computation 

A sparse matrix is a matrix with few nonzero entries. Many problems' data 
encountered in practice are sparse. By exploiting their structures, time and 
space required to perform operations on them can be greatly reduced. Many 
important applications of SDP, such as the problem of finding maximum cut, 
usually have sparse data, too. For this reason, we consider the sparse case in 
this paper. 

To be able to discuss the exploitation of sparse data in SDP, background 
on chordal graph theory is needed, which is addressed in the following section. 

4.1 Chordal graphs 

Let G = (V, E) be a simple undirected graph. A clique of G is a complete 
induced subgraph of G. A clique C = (V, E') is maximal if its vertex set V 
is not a proper subset of another clique. Let Adj(v) = {u G V : {u,v} G E} 
denote the set of all vertices adjacent to a vertex v G V. A vertex v is called 
simplicial if all of its adjacent vertices Adj(v) induce a clique. 
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For any cycle of G, a chord is an edge joining two non-consecutive vertices 
of the cycle. Graph G is said to be chordal if each of its cycles of length 4 or 
greater has a chord. One fundamental property of a chordal graph is that it 
has a simplicial vertex, say v± and that the subgraph induced by V \ {v\} is 
again chordal, which therefore has a simplicial vertex, say v 2 - By repeating 
this process, we can construct a perfect elimination ordering of the vertices, 
say (vi, v 2 , ■ ■ ■ , v n ), such that Adj{vi) fl {v i+ i, v i+2 , ■ ■ ■ , v n } induces a clique 
for each % — 1, 2, . . . , n — 1. It was shown by Fulkerson and Gross that a 
graph is chordal if and only if it has a perfect elimination ordering jB] . 

Given a perfect elimination ordering (t>i,t>2, • • • ,v n ) of a chordal graph, 
its maximal cliques can be enumerated easily. A maximal clique containing 
the simplicial vertex v\ is given by V\ U Adj(vi) and is unique. A maximal 
clique not containing v i is a maximal clique of the chordal subgraph induced 
on V \ {fi}. Therefore, by repeating this reasoning, the maximal cliques 
C r C V, r — 1, 2, . . . , I, are given by 

C r = {v { } U (Adj(vi) n {v i+ i, V i+2 , • • • , v n }) 

for i = min{j : Vj G C r }, that is, the maximal members of {{fi} U (Adj(vi) fl 
{v i+1 , v i+2 , • • • , v n }) : i = 1, 2, . . . , n}. 

One property of the sequence of maximal cliques is that it can be rein- 
dexed such that for any C r , r = 1, 2, — 1, there exists a C s , s > r + 1, 
such that 

c r n (o+i u c r+2 u • • • u Ci) c c s . (4) 

Such property is called the running intersection property. 

There is a well-known relationships between chordal graph and Cholesky 
factorization of sparse symmetric positive definite matrices. Given a symmet- 
ric positive definite matrix X, its Cholesky factor L is a lower-triangular ma- 
trix such that X = LL T . The sparsity pattern of X, which is defined as the set 
of row/column indices of nonzero entries of A, is often represented as a graph 
G = (V, E), where V = {1, 2, . . . , n} and E = {{i,j} : X {j ^0,i^ j}. Simi- 
larly, the sparsity pattern of L can be represented by the graph G 1 = (V, F), 
where F = {{i,j} : Lij ^ 0,i > j}. Under no numerical cancellations 
assumption, which means no zero entries are resulted from arithmetic op- 
erations on nonzero values, it is seen that F D E, with F having possibly 
additional fill-ins. In addition, G' = (V, F) is chordal and is said to be a 
chordal extension of G = (V, E). 
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The number of fill-ins in the Cholesky factorization depends on the or- 
dering of the row/column indices. The question of finding the reordering of 
the row/column indices to yield fewest fill-ins is NP-complete. In the best 
case when G is chordal, a perfect elimination ordering yields the Cholesky 
factor with no fill-ins. 

4.2 Partial symmetric matrix and positive definite ma- 
trix completion 

It is a well-known fact that in the course of SDP algorithms, the primal 
variable X usually is dense even if the data are sparse while the dual variables 
y and S stay sparse. To avoid working with a dense primal variable, Fukuda 
et al. suggested the use of a partial symmetric matrix for the primal variable 
in SDP algorithms jS]. Let V = {1, 2, ... , n}. Define the aggregate sparsity 
pattern E of the data to be 

E = G V x V : Q j ^ or [A^ ^ for some p <G {0, 1, . . . , m}}. 

Observing (JTJ), we see that the values of the objective function and con- 
straint linear functions only depend the entries of X corresponding to the 
nonzero entries of C and A p 's. The remaining entries of X affect only whether 
X is positive semi definite. In other words, if X and X' satisfy = X-p for 
any G E, then 

c»x = C*X' 
A P »X = A p »X',p= 1,2, ...,m. 

A partial symmetric matrix is a symmetric matrix in which not all of 
its entries are specified. A partial symmetric matrix X can be treated as a 
sparse matrix, having its unspecified entries regarded as having zero values. 
Hence, a sparsity graph G' = (V, E) can be used to represent the row/column 
indices of specified entries of X in the same manner as it is used to represent 
nonzero entries of a sparse matrix. Let S n (E, ?) denote the set ofnxn partial 
symmetric matrices with entries specified in F. We assume that all diagonal 
entries are also specified although there are no edges in G' representing them. 

A completion of a partial symmetric matrix X is a matrix X of the 
same size as X such that Xij = Xij for any {i,j} € F. A positive definite 
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completion of a partial symmetric matrix is a completion that is positive 
definite. The following theorem characterizes when a partial matrix has a 
positive definite completion. 

Theorem 4.1 (Grone et al. [HI Theorem 7]). Let G' = (V, F) be a 

chordal graph. Any partial symmetric matrix X e S n (F,l) satisfying the 
property that Xc r c T is symmetric positive definite for each r = 1,2,. . .,1, 
where {C r C V : r — 1, 2, . . . , 1} denote the family of maximal cliques of G' , 
can be completed to a positive definite matrix. 

4.3 Maximum-determinant positive definite matrix com- 
pletion 

The following result of Fukuda et al. 5j shows an efficient way to compute 
a certain positive definite matrix completion. Given a partial symmetric 
matrix X whose sparsity pattern G' = (V, F) is chordal, its unique positive 
definite completion that maximizes the determinant 

X = argmax x {det(X) : X is a positive definite completion of X} 

is shown to be 

PXP T = LlLl ■ ■ ■ Lj^DL^ ■ ■ ■ L 2 L U (5) 

where P is the permutation matrix such that (1,2,..., n) is the perfect elim- 
ination ordering for PXP T , L r (r = 1, 2, — 1) are sparse triangular 
matrices, and D is a positive definite block- diagonal matrix, both defined 
below Let (Ci, C*2, . . . , C{) be an ordering of maximal cliques of G' that 
enjoys the running intersection property Q. Define 

S r = C r \ (C r+ i U C r+2 U • • • U C,), r = 1, 2, . . . , /, 
u r = c r n (CV+i U C r+2 U • • • U Ci), r = 1, 2, . . . , I. 

The factors in (J3J) are given by 

{1 L _ i = 3, 

[XuluX Ur s r ] lf teU r ,jeS r , (6) 
0, otherwise 
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for r = 1, 2, — 1, and 



( D 



SiSi 



\ 



D 



Ds 2 s 2 



\ 



D s lSl j 



where 




(7) 



In addition, the unique determinant-maximizing positive definite completion 
X has the property that 



In other words, the inverse of the determinant-maximizing completion has 
the same sparsity pattern as that of the partial matrix. 

4.4 Using the maximum-determinant extension of X 



To exploit sparsity in the data matrices, our algorithm works in the space of 
partial matrix X for primal variable. When positive definite completion of 
X is needed, the maximum-determinant completion is used. We choose this 
particular completion because it preserves the self-concordance of the barrier 
function, which follows directly from Proposition 5.1.5 of ^3]. This property 
guarantees that our algorithm converges to an optimal solution. 

5 Computation of the dual Newton direction 

The single most computationally-intensive step in any potential reduction 
method is the computation of search directions. In Nesterov and Nemirovskii's 
method described in Section El this computation occurs in steps (ii), (iv), 
(vi), and (viii). For this reason, minimizing computation in these steps are 
emphasized in our algorithm. We describe our algorithm to compute N (step 
(vi) of the algorithm), which is the most computationally-intensive step in 



(x- 1 ) 



0, (i,j)£F. 



(8) 



in DPD 
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the computation of AS 2 , in this section. Conjugate gradient is used together 
with an idea from automatic differentiation in reverse mode to compute N. 
The minimization problem in step (vi) is 

min^ -(VlndetS + M) mN- ±((V 2 lndet S)iV) mN 

subject to: N = Y^=i z pA P) for some z E M. m . 

Replacing N with Y^=i z pA P in © yields 

m j mm 

min-(VlndetS' + M) m^z p A p - -((V 2 lndet S)^z p A p ) m^z p A p , 

p=l p=l p=l 

which is equivalent to solving the system 

m 

A(J2 z p(^ 2 In det S)A P ) = -A(V In det S — M) (10) 
P =i 

/ A t mW \ 



for z, where A.(W) 



. To rewrite (fTUj) into standard system 



\A m *W J 

of linear equations form, we first define the functions 

m 

/(u) = lndet(C-^M p A p ) 
P =i 

and 

h(u) = /(u) + M • (C - u p a p ~ S). 

P =i 

Then, the system (fTUjl is equivalent to 

(V 2 /i(y))z = -V/i(y). (11) 

Our algorithm uses conjugate gradient to solve (fTTj) to exploit the fact that 
(V 2 /i(y))z and V/i(y) can be computed efficiently. 
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5.1 Computing derivatives 

As described above, the conjugate gradient method when solving for N 
calls for the computation of (V 2 /i(y))z in each iteration and V/i(y) once. 
Note first that h(u) can be can be computed as follow: Cholesky factor- 
ize C — XlpLi u pA P = LL T , where L is a lower triangular matrix, compute 
mdet(C - Y^=i u p A p) = lndet(LL T ) = 21ndetL = 2£ i hiL i , an d finally 

h(u) = 2J2i^ n Li + M • (C — Y^=\ u v A v ~ s )- We derive the algorithm 
to evaluate V/i(u) from the above method of evaluating h(u) by imitating 
automatic differentiation (AD) in reverse mode, which is discussed in detail 
below. To evaluate (V 2 /i(u))z, notice that it is the derivative of the function 
g(u) = [h(u)] T z with respect to u. Hence, we can derive the algorithm to 
evaluate (V 2 /i(u))z from the algorithm to compute g(u), again by imitating 
AD in reverse mode. We emphasize that we do not suggest using AD to au- 
tomatically compute derivatives of h(u) given the algorithm to evaluate h(u) 
as we would not be able to control the space allocation of AD. Rather, we 
imitate how AD in reverse mode differentiate the algorithm to evaluate h(u), 
make additional changes to reduce space requirement (discussed below), and 
then hand-code the resulting algorithm. 

Automatic differentiation is a tool that receives a code that evaluates 
a function as its input and generates a new piece of code that computes 
the value of the first derivative of the same function at a given point in 
addition to evaluating the function. In essence, AD repeatedly applies the 
chain rule to the given code. There are two modes in AD, each representing 
a different approach in applying the chain rule. Forward mode differentiates 
each intermediate variable with respect to each input variable from top to 
bottom. Reverse mode, on the other hand, differentiates each output variable 
with respect to each intermediate variable from bottom up, hence the name 
reverse. Note that each entry in a matrix is treated individually. Therefore, 
one n x n input matrix is treated as n 2 input variables jH] . 

One mode is more suitable than the other in different situations. Complexity- 
wise, forward mode is more appealing when the number of input variables is 
less than the number of output variables while reverse mode is more appealing 
when the number of input variables is greater. Let 00(f) be the computation 
time of the given code, c be the number of input variables of the code, and 
d be the number of output variables. The code generated by forward mode 
computes the first derivative in time proportional to cu(f) while the one 
generated by reverse mode does so in time proportional to dw(f). However, 
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reverse mode has one additional disadvantage: the storage space required 
may be as large as time complexity of the original code, which can be much 
larger than the space complexity of the original code, because if a variable 
is updated many times throughout the evaluation of /, its values before and 
after each such update may be needed. Forward mode does not suffer from 
this problem because by taking derivatives from top to bottom, the old value 
of an intermediate variable is not needed after the variable is updated and 
therefore can be safely overwritten in the same storage space. The storage 
issue in reverse mode can be partially fixed by recomputing required values 
rather than storing them, but this approach may result in significant increase 
in computation time. 

For our problem, however, reverse mode can be applied to compute V/i(y) 
and (V 2 /i(y))z without increasing storage requirement. By performing re- 
verse mode AD by hand, it is seen that all of the intermediate variables can 
be overwritten safely and thus avoiding the need to store many versions of a 
variable. Therefore, our method of computing V/i(y) and (V 2 /i(y))z requires 
the same order of time and space complexity as the algorithm for evaluating 
the original function h(y). 

Analytically, it can be shown that V/(y) = ^.(S 1-1 ). From the definition 
of A(-), we see that only entries of S~ l in F, the chordal extension of the 
aggregated sparsity pattern E, need to be computed in order to compute 
V/(y) (and, consequently, V/t(y)). Erisman and Tinney showed a method of 
computing such entries of S -1 in the same order of time and space complexity 
as performing Cholesky factorization of 5* in 1975 |3]. Thus, their method can 
be used to compute V/i(y) in the same complexity as our proposed method. 
Nevertheless, our method proves useful as it can be extended to compute the 
Hessian-vector product efficiently 

This idea of imitating reverse AD is not limited to computing derivatives 
of h(u) . It can also be applied to compute the gradient of In det S with respect 
to entries in F of S, which is required in step (vi) and (viii) of our algorithm. 
The most computationally expensive part of evaluating In det S is to Cholesky 
factorize S, which is similar to the algorithm for evaluating h(u). Hence, 
their derivative codes are very similar, and all of the intermediate variables 
arising from performing reverse mode AD on In det S evaluation algorithm 
can be safely overwritten, too. The entries in F of (V 2 In det S)N required in 
step (viii) can also be computed using the same idea since (V 2 lndetS')A r = 
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We remark that our approach can be much more efficient than the obvious 
way of obtaining V/i(u) or the gradient of lndet S with respect to entries in 
F. The simple way of obtaining the entries in F of S~ l is to (i) compute the 
Cholesky factorization S = LL T and then (ii) compute the required entries 
of by using backward and forward substitution to solve linear systems 
of the form LL T Vi = for the required entries of Vi, the zth column of S~ x , 
where ej is the zth column of the identity matrix. When S is sparse, step (ii) 
may be much more computationally expensive than step (i). One example is 
when S is tridiagonal, in which case, step (i) requires only 0(n) operations 
while step (ii) requires 0(n) operations per entry of 5" -1 , which can result in 
a total of 0(n 2 ) operations if the number of nonzeros in W is 0(n). On the 
other hand, our algorithm would require only 0(n) operations in this case. 

Although it appears that the sparse-inverse algorithm has not been pre- 
viously used in semidefmite programming, it has been used elsewhere in the 
optimization literature. See, for example, Neumaier and Groeneveld [To] . 

6 Computation of the primal projected New- 
ton direction 

Following the discussion in Section EJ our algorithm works with a partial 
matrix X with specified entries in F, a chordal extension of the aggregated 
sparsity pattern E, for primal variable. For this reason, the computation 
of the primal search directions are more complicated than the dual ones de- 
scribed in previous section. Moreover, as we shall see below, the evaluation 
of (V 2 lndet X)P, where X is the maximum determinant positive definite 
completion of X and P is an arbitrary matrix, appears to be more expensive 
than performing Cholesky factorization. Consequently, the same algorithm 
used to compute the dual Newton direction as described in Section which 
involves evaluation of (V 2 lndetX)P in each iteration of the conjugate gra- 
dient, may not be efficient. Therefore in this section, we propose a different 
method for obtaining N that avoids excessive evaluation of (V 2 lndet A)P. 

Assume A -1 is known in addition to X (the detail on the computation 
of X~ l is addressed below in Section f6.2|) . Recall from (jHJ) that A -1 has 
sparsity pattern F and therefore is sparse. To compute for N, according to 
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step (ii), the problem under consideration is 



min^v -(VlndetX + M)«X-f((V 2 lndetX)X)«X (u) 

subject to: A v • N = 0,p = 1, 2, . . . , m. 

Note that VlndetX is X" 1 and (V 2 lndetX)X is X^NX' 1 . The KKT 
condition for the optimum solution to (|T2*j) is 

m 

X- X NX- X = X- 1 -M + Y^ W 

P =i 

or, equivalently, 

m 

N = X - XMX + ^2 \ P XA P X, (13) 
p=i 

where A p (p = 1, 2, . . . , m) is a scalar to be determined that enforces the 
condition A p • N = (p = 1,2,..., m). To determine A p 's, eliminate N from 
(Jinj) by taking inner product with A q (q = 1,2, ... ,m) on both sides and 
noting that A q • N = 0, yielding the linear system 

m 

A(J2 \XA P X) = A{XMX - X), (14) 
P =\ 

where A(-) is defined as in Section To rewrite (fT4^) as a standard system 
of linear equations form, define the function 

m 

q(u) = lndet(X _1 - ^UpAA. 

P =i 

The system (JTj| is therefore equivalent to 

(V 2 g(0))A = ^(XMX-X), (15) 

where A = (Ai, A 2 , . . . , A m ) T . Conjugate gradient is then used to solve the 
system (|15|) for A. After knowing A, we can now compute X from (|13j) . 

To solve for A p efficiently with conjugate gradient, it is important that 
(V 2 g(0))A and XMX are not expensive to evaluate. This is where X -1 
becomes useful. From the definition of A(-), we see that we do not need to 
know the entries outside F of the resulting matrices XMX. Also, the matrix 
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X l , unlike X, is not a partial matrix, but recall from Section EOl that X 1 
has the same sparsity pattern F as the partial matrix X. Moreover, XMX = 

((lndet W)M) a ^ (see appendix C of [T2]). Therefore, the entries 

in F of XMX can be computed using the idea of automatic differentiation 
in reverse mode in the same manner as computing (V 2 In det S)P, as detailed 
in Section IHTT1 The Hessian-vector product (V 2 g(0))A can also be handled in 
the same manner as (V 2 /i(y))z in the dual case. Lastly, the term X that is 
by itself in the quantity (X — XMX) of (fTK|) may be replaced by X safely as 
the entries outside F of X do not affect the equation after the inner product 
with A q is taken. 

6.1 Logarithm of determinant of positive definite com- 
pletion matrix 

Steps (i)-(ii) and (v)-(viii) of DPD can be performed using the techniques 
described in previous sections and the values of the matrices S, X, and X~ l . 
For step (ix), steepest descent method used to compute step size requires 
that the algorithm evaluates <j>{X + h x AX x + h 2 AX 2 , S + k 1 AS 1 + k 2 AS 2 ) 
for a current point (hi, h 2 , k x , k 2 ) to be able to decide when to terminate 
the steepest descent. But since we only have the partial matrices X, AX X , 
and AX 2 , we need to to be able to evaluate In det X after we update X 
as X + hxAXi + h 2 AX 2 ). Computation of In det X is not trivial because, 
unlike the objective function or the linear constraints, the value of In det X 
does depend on the entries outside the aggregated sparsity pattern F. We 
cover an efficient algorithm to compute In det X in this section. 

Consider a partial symmetric matrix X with sparsity pattern F, a chordal 
extension of E. Using the factors given in (jSJ), the value of In det X can be 
evaluated efficiently as follows. Because each L r (r = 1, 2, . . . , I — 1) is unit 
lower triangular, its determinant is one. The determinant of the block diago- 
nal matrix D is the product of the determinants of each of its diagonal blocks 
D Sr s r , r = 1,2,..., I. Observe that D SrSr = X Sr s z - X SrUr X^ Ur X UrSr (r = 
1,2,...,/ — !) in ((7|) is the Schur complement of Xu r u T in 



QXc r CrQ 

for some permutation matrix Q. The determinant of the Schur complement 
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is 



det(D SrSr ) 



det(QX CrCr Q T ) 



det(X UrUr ) 

det{X CrCr ) 
det(X UrUr y 



for r 



1,2,. ..,1-1 



Therefore, 



In det X 



In det (PXP T ) 



( U l r=idet(X CrCr ) \ 
\U l ~=\det(X UrUr ) J 



i 1-1 



(lndetX CrCr ) - (^detX UrUr ) . 



(16) 



r=l r=l 



Note that performing Cholesky factorization of a positive definite matrix 
with sparsity pattern F requires 0(X^Li s i )■> where s, = maxcvsi |C r fl {i, % + 
1, . . . , n}|, assuming (1, 2, ... , n) is a perfect elimination ordering. On the 
other hand, computing In det X by straightforward application of that 
is, by computing determinants of each X Cr c r an d X UrUr separately, requires 
0(Y^ l r= x \C r \ 3 ) operations. Notice that the time required to perform Cholesky 
factorization of a positive definite matrix with sparsity pattern F is the lower 
bound of the computation time of In det X, which occurs when X — X. For 
this reason, we seek to find an algorithm that computes In det X in the same 
order of complexity as that of performing Cholesky factorization on the same 
sparsity pattern. 

In the most favorable case where none of the maximal cliques over- 
lap, straightforward application of (|16|) has the same time complexity as 
that of Cholesky factorization. To see the equivalence of the two algo- 
rithms' complexity in this case, note that OQ^ILi s l ) = C(Sr=i J2iec r s ?) = 
0(^21=1 |Cr| 3 )- An example of such case is when X is block diagonal. 

We consider the efficiency of straightforward calculation of (fT^|) in the 
case that the sparsity pattern graph G = (V, E) is planar next as this special 
case arises often in practice. Our analysis assumes that the vertices of G are 
ordered according to the nested dissection ordering. Lipton et al. introduce 
generalized nested dissection and show that performing Cholesky factoriza- 
tion on said ordering requires 0(n 3 / 2 ) operations, where n is the number of 



18 



vertices ^J]- Planar graphs satisfy a \/n- separator theorem, which states 
that the vertices of the graph G can be partitioned into three sets A, B„ and 
C such that there are no edges having one endpoint in A and the other in 
B, \A\, \B\ < |n, and |C| < \^8n. Nested dissection ordering is computed by 
partitioning V into A, B, and C according to the separator theorem, num- 
ber the unnumbered vertices in C such that they are eliminated after the 
unnumbered vertices in A and B, and then recursively number the unnum- 
bered vertices in A U C and BUC. The recursion stops when the number of 
vertices under consideration is less than 72, at which point, the unnumbered 
vertices are numbered arbitrarily. It is shown in Lipton et al. that, for a 
given A, B, and C in any level of the recursion, no vertex in A is adjacent to 
any vertex in B in the chordal extension graph G'. Therefore, any maximal 
clique of G' can contain at most the vertices in the separator C of each recur- 
sion hierarchy and additional 72 vertices from the lowest level of recursion. 
Since each recursion reduces the number of vertices to at most |n', where 
n' is the number of vertices in consideration of the current level, and the 
separator has at most V8n' vertices, the number of vertices in any maximal 




of maximal cliques is no greater than n. Therefore, straightforward calcula- 
tion of In det A requires 0(n(y/n) 3 ) = 0(n 5 ^ 2 ) operations, which is greater 
than that of Cholesky factorization by a factor of n. 



As seen from the planar case, straightforward computation of In det A 
can be significantly more expensive than Cholesky factorization. Another 
common case that suffers from the same problem is when A is a banded 
matrix. A banded matrix with bandwidth p satisfies the property that the 
entry A^- = if \i — j\ > p. Performing Cholesky factorization on such 
a matrix takes 0{np 2 ) operations. To analyze time complexity of In det A 
computation, first notice that (1,2, . . . ,n) is a perfect elimination ordering 
and that the sequence of maximal cliques {C 1; C 2 , . . . , C n _ p }, where C r = 
{r, r + 1, . . . , r + p} (r = 1, 2, . . . , n — p), satisfies the running intersection 
property (@J). Therefore, straightforward computation of In det A requires 
0(J2l = i |CV| 3 ) = 0((n — p)(p + l) 3 ) = 0(np 3 ) operations, which is greater 
than 0(np 2 ) operations of Cholesky factorization. 

However, it is possible to reduce the complexity of computing In det A to 
0(np 2 ) operations in the banded matrix case by using the following idea. The 
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determinant of each (positive definite) submatrix Xc r c r an d Xjj r jj r is usually 
computed from the product of diagonal entries of its Cholesky factor. If a set 
C r (resp. U r ) shares many members with another set C r > (resp. U r r), r ^ r', 
the Cholesky factor of a symmetric permutation of Xc r ,c r , (resp. Xjj r ,u r ,) 
can be constructed from the Cholesky factor of Xc r c r (resp. Xu r u r ) or vice 
versa, which is more efficient than computing Cholesky factor of Xq ,c , 
(resp. Xjj ,u ,) from scratch. In the banded matrix case, any two adjacent 
cliques C r and C r+ \ (r = 1, 2, . . . , n — p — 1) share the same p — 1 elements 
(C r fl CV+i = {r+l,r + 2,...,r +p}). The same can be said about adjacent 
UrS. Observe that U r = {r + 1, r + 2, . . . , r + p} (r = 1, 2, . . . , n — p — 1) and 
U n -p = 0- Therefore, the two adjacent U r and U r+ i (r = 1, 2, . . . , n — p — 2) 
share p — 2 elements. 

The process of updating a Cholesky factor is as follow. Let L r be the 
Cholesky factor of Xc r c r - Remove the first row of L r , which corresponds to 
the rth row/column of X, and let L r be the resulting {p — 1) x p submatrix. 
We then transform Lj to a (p — 1) x (p — 1) upper triangular matrix i? by 
performing Givens rotations to zero out the p — 1 entries below the main 
diagonal of Lj. Notice that the columns of R corresponds to the (r + l)th, 
(r+2)th,. . . ,(r+p— l)th columns of X. Therefore, R T is exactly the first p— 1 
rows of the Cholesky factor L r+ i of X Cr+1 c r+1 - The final row of L r+1 , which 
corresponds to the (r+p)th column of X, can be computed straightforwardly 
given the other rows of L r+1 and X Cr+1 c r+1 - The same technique can be 
repeated to construct the Cholesky factor of Xc r+2 c r+2 from L r+ i and so 
on. This technique computes L r+ i in 0{p 2 ) operations (as opposed to 0(p 3 ) 
operations if L r+ i is computed from scratch) and hence reduces the total 
time to compute IndetX to 0{np 2 ) operations, which is the same order as 
the complexity of Cholesky factorization. Also note that, incidentally, R T 
is the Cholesky factor of X UrUr . This coincidence does not always occur in 
general case. 

This Cholesky updating process is not limited to the banded matrix case. 
In general, given the Cholesky factor L r of X Cr c r (resp. X UrUr ), the Cholesky 
factor of a symmetric permutation of Xc r ,c r , (resp. Xu ,u ,) can be con- 
structed by removing the rows of L r corresponding to C r \C r r (resp. U r \U r r), 
performing Givens rotation to transform its transpose into an upper trian- 
gular matrix, and then appending the rows corresponding to C r i \ C T (resp. 
U r > \ U r ). The resulting matrix may not be the Cholesky factor of X Cr ,c r , 
(resp. Xu r ,u r ,) but rather of some symmetric permutation of it because the 
rows corresponding to C r i \ C r (resp. U r i \ U r ) are always appended to the 
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bottom. Since permuting a matrix symmetrically does not affect its determi- 
nant, the resulting Cholesky factor can be used for determinant computation 
as is. 

Roughly speaking, the above technique is more efficient the smaller \C r \ 
C r / 1 and | C r i \ C r | are. Larger \C r \ C r >\ usually implies more Givens rota- 
tions while larger \C r > \ C r \ implies more computation of the entries of the 
appending rows. However, if the rows to be removed are the bottom rows of 
L r , no Givens rotations are required (as the resulting matrix remains lower 
triangular). Therefore, large \C r \C r >\ does not imply many Givens rotations 
in this case. 

Hence, we are able to compute In det X "optimally" (in the sense of within 
the same order of complexity as performing Cholesky factorization) in two 
special cases: block diagonal and banded matrices. However, there are cases 
where it seems In det X cannot be computed "optimally" using Cholesky up- 
dating scheme, for example, the planar graph case. It is this reason that pre- 
vents us from using the dual algorithm described in Section El to compute the 
primal projected Newton direction as it requires evaluation of (V 2 In det X)P 
in each iteration of the conjugate gradient and evaluating (V 2 lndetX)P is 
generally at least as expensive as evaluating In det X. 

6.2 Computation of X~ l and (V 2 lndet X)P 

As mentioned in Section EJ computing N in step (ii) of DPD requires the 
knowledge of X^ 1 = VlndetX. In addition, steps (iii), (iv), and (ix) also 
call for VlndetX and (V 2 In det X)iV in the formula for AS\ and in the 
steepest descent direction, respectively. From (|16J) . the matrix X -1 is seen 
to be 

X- 1 = -i In det A" 
dX 

= f (sr - £ (^m**^) . (17) 

Recall that Xc r c r ( r — 1, 2, . . . , Z) and Xu r u r {r — 1> 2, • • • , 1) are completely 
dense. For this reason, using automatic differentiation would not yield a 
more efficient first-derivative computing algorithm than simply computing 
-~ In det Xc r c r an d -77 In det Xjj r u r conventionally (by finding their inverses) 
and piecing them together according to (|17|) . The same is true with the 
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product of the second derivative and an arbitrary matrix 

(^lnde t X)P< = f (^lndetX CA ) rJ£ (^Jlndet.?^) P. 

r=l x ' r=l ' 

Recall that (V 2 In det W) P' = —W'^^P'W' 1 . Therefore, our algorithm com- 
putes -iTT In det X by the simple algorithm described above. The computation 
of (-gsy In det X)P' is also handled similarly: by computing the product of the 
second derivative of each dense submatrix and P' conventionally and then 
piecing them together. 

The Cholesky updating technique as described in Section 16.11 can also 
be applied to the computation of ^lndetX and (-7^2 In det X)P'. Both of 
these computations involves computing the Cholesky factor of each clique in 
order to compute its inverse. Hence, the same technique can be applied to 
reduce the computation time required to find the Cholesky factors. 

7 Estimates of time and space complexities 

We give estimates of time and space complexities of our algorithm in this 
section. Let 0(Time(L F )) and 0(Space(L^)) denote the time and space 
complexity of Cholesky factorizing a matrix with sparsity pattern F, respec- 
tively. Steps (i) and (v) do not require any computation. Step (ii) involves 
a conjugate gradient to solve (JTSJ). Each iteration of the conjugate gradient 
requires one evaluation of the Hessian- vector product (V 2 g(0))A, which is 
0(Time(Li?)) and 0(Space(Li?)). The conjugate gradient takes at most n it- 
erations to converge. Step (iii) requires one evaluation of (V 2 In det X)N. 
Step (iv) requires one evaluation of VlndetX and (V 2 lndetX)iV each. 
Step (vi) involves a conjugate gradient that takes one evaluation of the 
Hessian-vector product (V 2 /i(y))z and therefore requires 0(Time(Lp)) and 
0(Space(Li?)) per conjugate gradient iteration. The conjugate gradient also 
takes at most n iterations to converge. Step (vii) requires one evaluation of 
the entries in F of (V 2 In det S)N, which is 0(Time(L F )) and 0(Space(L^ )). 
Step (viii) calls for one evaluation of the entries in F of V In det S and 
(V 2 In det S)N each. Finally, step (ix) requires one evaluation of In det X 
per iteration of steepest descent. 

A few steps in the algorithm, namely steps (iii), (iv), and (ix), require 
evaluation of either In det X or one of its derivatives. Generally, evaluating 
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In det X or its derivative is more expensive than 0(Time(Lp)). Fortunately, 
the computational results in Section |S] show that only a constant number of 
steepest descent iterations are needed to find good step size, and steps (iii) 
and (iv) only require at most two evaluations of such quantities. Space com- 
plexity of an evaluation of In det X, on the other hand, is still 0(Space(LF)) 
as evaluating In det X reduces to computing the Cholesky factor of each of 
the maximal cliques without having to store the Cholesky factor of more 
than one clique at a time. For the case where G = (V, E) is planar and 
F is its chordal extension when the vertices V are ordered in nested dissec- 
tion ordering, computing In det X (and, consequently, each iteration of the 
steepest descent method) requires 0(n 5 / 2 ) operations and O(nlogn) space. 
Notice that 0(n 5//2 ) operations for these steps are acceptable as each of the 
conjugate gradient takes 0(Time(Li?)) = 0(n 3 ^ 2 ) operations per iteration 
and at most n iterations to converge, resulting also in 0(n 5 ^ 2 ) operations for 
steps (ii) and (vi). 

As the last remark for this section, we note that we suspect that this 
estimate of 0(n 5//2 ) for planar case may not be tight. For the special case 
that G is a grid graph, it is not hard to show that computing In det X requires 
only 0(n 3 / 2 ) operations, which is the same order as performing Cholesky 
factorization. 

8 Computational results 

We implemented and tested our algorithm by using it to solve various in- 
stances of the problem of finding maximum cut (MAX-CUT). The procedure 
of using SDP to solve MAX-CUT is proposed by Goemans and Williamson 
in 1995 [7j. Readers are referred to Goemans and Williamson's paper for the 
details of the procedure. In MAX-CUT, the input graph whose maximum 
cut is sought is exactly the aggregated sparsity pattern E of the resulting 
SDP program. 

Given the aggregated sparsity pattern E, we find its chordal extension 
by ordering the vertices of E according to the symmetric minimum degree 
ordering |14| . perform symbolic Cholesky factorization on the reordered ma- 
trix, and use the resulting Cholesky factor as the chordal extension. The 
primal partial variable is initialized to the identity matrix. The dual variable 
is initialized to C — I after C has been reordered according to the minimum 
degree ordering. After the algorithm finds an iterate whose duality gap is 
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n 


m 


Trials 


Time (s) 


Num Iter 


AXi CG 


AS 2 CG 


Pot Min 


5 


7 


100 


1.30 


13.8 


3.8 


3.8 


2.0 


10 


16 


100 


5.43 


15.8 


8.8 


9.2 


2.2 


20 


40 


50 


25.12 


17.9 


14.9 


15.4 


2.4 


50 


75 


10 


124.33 


21.9 


24.9 


26.1 


3.6 


100 


180 


5 


894.44 


25.2 


35.3 


37.0 


4.9 



Table 1: Summary of results of the algorithm on random instances. From 
left to right, the columns are the number of vertices, the number of edges, 
the number of trials run, the average CPU time in seconds, the number of 
main iterations, the average number of conjugate gradient iterations required 
to compute the search direction AX\ for one point, the average number of 
conjugate gradient iterations required to compute AS2 for one point, and 
the average iterations to minimize potential along the four directions for one 
starting point (step (ix) in the algorithm). 

less than 10~ 3 , it continues for 3 additional iterations and then terminates. 
Each conjugate gradient runs until ||r||2 is less than 10~ 5 times the 2-norm 
of the constant term of the system that the algorithm is trying to solve. 
Finally, step (ix) of the algorithm is implemented using the method of steep- 
est descent starting from four initial points (X + AXi,S), (X + AX 2 ,S), 
(X, S + ASi), and (X, S + AS2) separately (Refer to chapter 6.5.2 of [TU] for 
the explanation of the method of steepest descent). We do not perform any 
line searches in the steepest descent; we simply take the step size to be iden- 
tically one and take the step as long as the new point decreases the potential. 
Line searches are ignored because, according to our testing, performing line 
searches does not generally improve computation time of the algorithm. The 
additional evaluation of In det X in each step of the line searches appears to 
be too expensive compared to the extra decrease in potential resulted from 
them. 

The test instances were generated by adding edges to the graph randomly 
until the chosen number of edges were met. The number of main iterations 
reported in column 5 of Tabled is the number of times the algorithm repeats 
step (i) to (x) before it finds an optimal solution is found. 

The results in Table Q show that the conjugate gradient to compute N, 
the matrix necessary for computation of AX\, requires more number of iter- 
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n 


m 


Num Trials 


Time (s) 








4 directions 


2 directions 


5 


7 


100 


1.30 


1.23 


10 


16 


100 


5.43 


7.58 


20 


40 


50 


25.12 


37.40 


50 


75 


2 


124.33 


191.46 



Table 2: Comparison of the average CPU time (in seconds) the two algo- 
rithms required to solve the random instances. 

ations than to compute N, the matrix necessary for computation of AS2. In 
addition, the potential minimization by the method of steepest descent only 
takes a few iterations for each initial point and therefore does not steal away 
too much valuable time that could be spent on other computations. 

The two directions AX2 and AS± are not common in literature although 
the other two directions AXi and AS2, which are the projected Newton 
directions, appear in many other SDP algorithms. We performed an exper- 
iment to test whether using all 4 directions are more beneficial than using 
only 2 more common directions. We tested the two versions of our algorithms 
on the random instances generated in the same manners as the ones in the 
previous experiment. The results are shown in table El 

Table |2 shows that using all four directions make the algorithm find the 
optimal solution in shorter time in all test cases. The reason toward this 
result is that all of the quantities involved in computation of the two uncom- 
mon directions are also required to compute the other two projected Newton 
directions. Therefore, computation of the additional two unusual directions 
is relatively cheap compared to the reduction in potential they induce. 

Finally, we tested the computation of In det X with Cholesky updating 
scheme on banded matrices to verify its 0(np 2 ), or more precisely, 0((n — 
p)p 2 ) complexity. We began by fixing the bandwidth p to be 3 and varying 
n from 6 to 40, repeated 500 times for each n. Figure d shows the plot of the 
average CPU time to compute In det X for banded matrices with bandwidth 
3 of various size against the number of vertices, and it confirms the linear 
dependency on n of the complexity. Next, we fixed n — p to 10 and varying 
p from 1 to 40, repeated 50 times each. The plot of the average CPU time 
against the square of the bandwidth for this experiment is shown in figure 121 
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Figure 1: Average CPU time to compute IndetX in the case that X is a 
banded matrix. Bandwidth is fixed to 3 while varying the number of vertices. 

The plot agrees that the complexity of In det X is proportional to p 2 in the 
banded case. 

9 Concluding Remarks 

We showed an implementation of a SDP solver that exploits sparsity in the 
data matrices for both primal and dual variables. Our algorithm is based 
on the primal-dual potential reduction method of Nesterov and Nemirovskii 
and uses partial primal matrix variable as proposed by Fukuda et al. Two 
of the search directions are projected Newton directions that can be found 
by solving linear systems involving the gradient and the Hessian of the loga- 
rithm of the determinant of a matrix with respect to a vector. We observed 
that the idea from reverse mode of automatic differentiation can be applied 
to compute the mentioned gradient and the product of the Hessian and an 
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Figure 2: Average CPU time to compute IndetX in the case that X is a 
banded matrix. The quantity n—p is fixed to 10 while varying the bandwidth. 
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arbitrary vector efficiently, which is in the same order as computing determi- 
nant of a sparse matrix. Using this observation, we solve the linear system 
for the search directions by conjugate gradient, which requires one evaluation 
of the product of the Hessian and a vector in each iteration in exchange for 
not having to factorize the Hessian matrix. For the primal case, we pro- 
pose a way to compute one of the primal search directions without requiring 
the determinant of the positive definite completion in each iteration of the 
conjugate gradient because the determinant of such completion is generally 
more expensive than performing Cholesky factorization. This determinant 
is still required in the potential minimization to find step sizes as well as 
in the course of computing one other search direction, we described a tech- 
nique to reduce the complexity of computing the logarithm of the determi- 
nant of a positive definite matrix completion by reusing the Cholesky factors 
when there are many overlaps of maximal cliques. This technique reduces 
the complexity to that of performing Cholesky factorization in the banded 
matrix case but still cannot achieve the same complexity as the Cholesky fac- 
torization in general. Fortunately, only a few number of evaluations of the 
determinant of such completion is required per one SDP iterate. The other 
two non-Newton directions can be computed efficiently since they require 
the same quantities that are already computed in the process of finding the 
former projected Newton directions. We then tested our algorithm on ran- 
dom instances of the MAX-CUT problems. From the results, the conjugate 
gradients do not require too many iterations to converge for the algorithm 
to be impractical. 

There are questions unanswered in this paper that can help improve the 
algorithm described here. For example, can we compute the logarithm of 
the determinant of a positive definite matrix completion more efficiently, 
perhaps by another derivation different from (jSJ)? The other issue is regarding 
the stability. How can we incorporate preconditioning to mitigate the ill- 
conditioning of the linear system problems? Regardless, our algorithm should 
prove efficient in the applications where the data matrices are sparse. 
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